Information retrieval strategies for accessing African audio corpora

Authors

  • Abdillahi Nimaan
  • Pascal Nocera
  • Frédéric Béchet
  • Jean-François Bonastre
Abstract

In this paper we present a first approach to accessing African oral corpora that combines automatic speech recognition (ASR) and information retrieval. First, we present the principal characteristics of our Somali speech recognizer [8] and the results obtained on real audio archives gathered from Djibouti Radio. Second, we present a Hybrid Language Model (HLM) that includes words and sub-words to improve robustness against out-of-vocabulary (OOV) words. We then run information retrieval experiments with various strategies, searching the different outputs of the ASR system (words, sub-words and hybrid). Finally, we present a new strategy that combines sub-words and words to improve the information retrieval results.
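The idea of falling back from words to sub-words for OOV query terms can be illustrated with a toy sketch. This is not the authors' system: the abstract does not specify the sub-word units or the scoring, so character trigrams stand in for sub-words here, and the names `char_ngrams`, `score` and `rank` are hypothetical.

```python
from collections import Counter


def char_ngrams(word, n=3):
    """Split a word into overlapping character n-grams.

    Assumption: n-grams are only a stand-in for the sub-word units
    of the paper, which the abstract does not describe.
    """
    padded = f"#{word}#"
    if len(padded) <= n:
        return [padded]
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def score(query, doc_words, vocabulary, n=3):
    """Score one transcript against a list of query terms.

    In-vocabulary terms are matched as whole words (term count).
    OOV terms fall back to the fraction of their n-grams that
    occur anywhere in the transcript.
    """
    doc_counts = Counter(doc_words)
    doc_subs = Counter(g for w in doc_words for g in char_ngrams(w, n))
    total = 0.0
    for term in query:
        if term in vocabulary:
            total += doc_counts[term]
        else:
            grams = char_ngrams(term, n)
            matched = sum(1 for g in grams if doc_subs[g] > 0)
            total += matched / len(grams)
    return total


def rank(query, transcripts, vocabulary):
    """Return transcript ids sorted by decreasing score."""
    scored = {doc_id: score(query, words, vocabulary)
              for doc_id, words in transcripts.items()}
    return sorted(scored, key=lambda d: scored[d], reverse=True)
```

With a hypothetical word vocabulary that lacks a query term, a call such as `rank(["djibouti"], transcripts, vocabulary)` would still prefer transcripts whose tokens share trigrams with that term, which is the kind of robustness to OOV words the abstract aims for.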

Similar resources

Visualizing search results in the information retrieval process

Purpose: One of the most effective ways to achieve optimum information retrieval is through visualization of information. Search strategies, probing skills, querying of information needs and analysis of information play a significant role in accessing necessary and useful information. Besides the factors mentioned above, information visualization can increase the availability level of in...

Bikers Accessing the Web: The SmartWeb Motorbike Corpus

Three advanced German speech corpora have been collected during the German SmartWeb project. One of them, the SmartWeb Motorbike Corpus (SMC), is described in this paper. As with all SmartWeb speech corpora (e.g. (Mögele et al., 2006)), SMC is designed for a dialogue system dealing with open domains. The corpus is recorded under the special circumstances of a motorbike ride and contains utteranc...

Speech Recognition and Information Retrieval: Experiments in Retrieving Spoken Documents

The Informedia Digital Video Library Project at Carnegie Mellon University is making large corpora of video and audio data available for full content retrieval by integrating natural language understanding, image processing, speech recognition and information retrieval. Information retrieval from corpora of speech recognition output is critical to the project’s success. In this paper, we out...

On the applicability of advanced information retrieval techniques for a database of African music

Music information retrieval from an archive of audio recordings of African music poses many challenges for research. Apart from audio feature extraction and audio content description, there is a need for techniques that allow flexible database querying. These techniques have to be developed in relation to a careful cultural, social and economic analysis of the user profile and the general con...

Modular System Design for Multimedial Information Handling

Often, information retrieval from various other media is analogous to text-based retrieval; however, accessing documents in e.g. audio or video formats causes some extra problems, in particular with respect to document segmentation, choice of indexing features, and robustness. We review these difficulties, together with some previous attempts to overcome them, and then describe a very flexible, mo...

Publication date: 2007